Phonetic based SoundEx & ShapeEx algorithm for Sindhi Spell Checker System

نویسندگان

  • Zeeshan Bhatti
  • Ahmad Waqas
  • Imdad Ali Ismaili
  • Dil Nawaz Hakro
  • Waseem Javaid
چکیده

This paper presents a novel combinational phonetic algorithm for Sindhi Language, to be used in developing Sindhi Spell Checker which has yet not been developed prior to this work. The compound textual forms and glyphs of Sindhi language presents a substantial challenge for developing Sindhi spell checker system and generating similar suggestion list for misspelled words. In order to implement such a system, phonetic based Sindhi language rules and patterns must be considered into account for increasing the accuracy and efficiency. The proposed system is developed with a blend between Phonetic based SoundEx algorithm and ShapeEx algorithm for pattern or glyph matching, generating accurate and efficient suggestion list for incorrect or misspelled Sindhi words. A table of phonetically similar sounding Sindhi characters for SoundEx algorithm is also generated along with another table containing similar glyph or shape based character groups for ShapeEx algorithm. Both these are first ever attempt of any such type of categorization and representation for Sindhi Language.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Bangla Phonetic Encoding for Better Spelling Suggestions

We present a phonetic encoding for Bangla that can be used by spelling checkers to provide better suggestions for misspelled words. The encoding is based on the Soundex algorithm, modified to match Bangla phonetics. We start by analyzing Soundex encoding scheme when applied to Bangla. &ext we propose a new encoding that handles the case of Bangla words, including those containing conjuncts. We ...

متن کامل

Misspellings in Responses to Listening Comprehension Questions: Prospects for Scoring based on Phonetic Normalization

Automated scoring systems which evaluate content require robust ways of dealing with form errors. The work presented in this paper is set in the context of scoring learners’ responses to listening comprehension items included in a placement test of German as a foreign language. Based on a corpus of over 3000 responses to 17 questions, by test takers of different language proficiencies, we perfo...

متن کامل

A Novel Binary Spell Checker

In this paper we propose a simple, flexible and efficient hybrid spell checking methodology based upon phonetic matching, supervised learning and associative matching in the AURA neural system. We evaluate our approach against several benchmark spell-checking algorithms for recall accuracy. Our proposed hybrid methodology has the joint highest top 10 recall rate of the techniques evaluated. The...

متن کامل

Soundex Algorithm for Indian Language Based on Phonetic Matching

In a system with a large database, there always has been a problem that names may not be spelled well or might not be spelled in a way that one expected. So, data in the database gets degraded. In this case it is required to search the duplicates and merge them in the single entity. In doing so, one problem is that the way in which the strings would be compared. In such cases rather than lookin...

متن کامل

Design and Implementation of Punjabi Spell Checker

Spellcheckers are the basic tools needed for word processing and document preparation. Designing a spell checker for Indian languages such as Punjabi poses many new challenges not found in English, which complicates the design of the spell checker. Punjabi language is far different from Western languages in phonetic properties and grammatical rules. Thus the existing algorithms and techniques t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1405.3033  شماره 

صفحات  -

تاریخ انتشار 2014